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Geometric remarks on 
Kalman filtering with intermittent observations 

Andrea Censi 

Abstract 

Sinopoli et al. (TAC, 2004) considered the problem of optimal estimation for linear systems with 
Gaussian noise and intermittent observations, available according to a Bernoulli arrival process. They 
showed that there is a "critical" arrival probability of the observations, such that under that threshold 
the expected value of the covariance matrix (i.e., the quadratic error) of the estimate is unbounded. 
Sinopoli et al, and successive authors, interpreted this result implying that the behavior of the system 
is qualitatively different above and below the threshold. This paper shows that this is not necessarily 
the only interpretation. In fact, the critical probability is different if one considers the average error 
instead of the average quadratic error. More generally, finding a meaningful "average" covariance is 
not as simple as taking the algebraic expected value. A rigorous way to frame the problem is in a 
differential geometric framework, by recognizing that the set of covariance matrices (or better, the 
manifold of Gaussian distributions) is not a flat space, and then studying the intrinsic Riemannian 
mean. Several metrics on this manifold are considered that lead to different critical probabilities, or no 
critical probability at all. 

I. Introduction 

The Kalman filter was conceived in the 1960s [1] and found immediate use at the forefront 
of engineering [2]. For the successive decades, the state-space approach of the Kalman filter 
was the tool of choice for many filtering and tracking problems, both in its algebraically 
equivalent formulations (e.g.. Information filter [3], square root and "array" algorithms [4]) 
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and its extensions to nonlinear problems (e.g., extended Kalman filter (EKF), unscented Kalman 
filter), only recently giving way to Monte Carlo methods (particle filters). 

In recent years, in many engineering fields, estimation problems have been considered where 
the availability or observations, or their structure, is subject to random phenomena, and one is 
interested in characterizing the "average accuracy". For example, in robotics, the EKF is used in 
problems such as Simultaneous Localization and Mapping (SLAM); the observations structure 
depends on the landmark configuration, which is unknown a priori, yet it is of interest to study 
average accuracy results [5]. 

In the control literature, random observations can model packet drops, which is one of the 
important phenomena in network-based estimation and control. Sinopoli et al. [6] considered the 
problem of Kalman filtering when the observations are available intermittently with Bernoulli 
probability. They showed that there exists a critical value of the arrival probability such that, 
under that threshold, the expected value of the error covariance matrix is unbounded. Other 
successive papers improved on the same results by better characterizing the critical probability 
or considering non-independent packet drops [7]-[10]. 

The way the result of Sinopoli et al. is often interpreted is that the system has a qualitatively 
different behavior above and below the critical probability. The purpose of this note is to show 
that this is not necessarily the only interpretation. A motivating example is given in Section [III 
Considering the expected value of the covariance is equivalent to considering the expected value 
of the squared error norm E{||e||^}. If one instead considers the error norm E{||e||2}, which 
is equivalent to considering the expected value of the standard deviation, a different — and 
lower — critical probability is obtained. This raises doubts about the significance of Sinopoli 
et aZ.'s critical probability. More generally, what is critical is the way one defines the "average" 
uncertainty. Because the operation of expected value is not invariant to change of coordinates, 
the result is different if one averages the covariances, the standard deviations, or the information 
matrices: in general, IE{P} 7^ E{-\/P}^ 7^ E{P^^}^^. This paper advocates a geometric point of 
view. The basic assumption is that covariance matrices are only a particular choice of coordinates 
to represent Gaussian distributions, which is a Riemannian manifold with a very rich structure. 
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Section |lll] deals with how to extend the idea of "mean" to Riemannian manifolds, and how 
that depends on the choice of a metric. Section |IV] discusses several metrics one can use for 
the manifold of Gaussian distributions. After the obvious metrics are discussed (which lead to 
averaging covariances, information matrices, etc.), a non-trivial Riemannian metric is introduced 
that is shown to be the most most natural when dealing with Gaussian distributions, or, in general, 
when considering the intrinsic properties of the set of positive definite matrices. These different 
metrics lead to different critical probabilities, or no critical probability at all. 

Notation: All matrices are assumed to be real. Let A* be the transpose of the matrix A, 
let Tr(A) be its trace, and {Aj(A)} its eigenvalues. Let G\j{n) be the set of n x n invertible 
matrices; let 0{n) be the set of orthogonal matrices; let S(n) be the set of symmetric n x n 
matrices; and let T^n) C §{n) be the set of positive definite matrices. Let Sin) be the manifold of 
Gaussian distributions on M", and So(^) C 9(^) the submanifold of Gaussian distributions with 
mean 0. An element of 9(^) is denoted as P), where the mean /x G and the covariance 
P G J'(n) serve as coordinates on S{n). Let ||-|| be the operator norm (||A||^ = Amax(AA*)), and 
let II "lip, be the Frobenius norm (||A||5, = Trace(AA*)). For P G ^(n), let VP be the unique 
matrix in '^{n) such that (V^)^ = P. AH inequalities between matrices are to be interpreted 
in the Lowner partial order: Pi > P2 iff Pi — P2 is semidefinite positive. 

n. Motivating example 
Consider the discrete-time linear dynamical system 

x{k + l) = Ax{k) + Buj{k), 
y{k) = Cx{k) + e{k), 

with X G M", a; G W, y G and A, B, C real matrices of appropriate sizes. Assume u:{k) and 
e{k) are white Gaussian sequences with zero mean and covariance matrix equal to the identity, 
and that the initial prior for a;(0) is Gaussian with mean x{Q) and covariance P(0). Moreover, 
assume that the observations are available randomly, i.e., one has available the observations 
y'{k) = 'y{k)y{k), where 'y{k) is a sequence of independent Bernoulli random variables, such 
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that P({7(A;) = 1}) = 7 and F{{j{k) = 0}) = 1—7. The conditional estimate of x{k), given 
the available observations until time k is still Gaussian [6], and is indicated by the mean x(k) 
and the covariance P(A;). Define the error estimate e{k) = x{k) — x{k). Then e{k) has a 
Gaussian distribution with mean and covariance P(A;). This is the setup considered in [6] and 
is henceforth called Linear/Gaussian/BemouUi (LGB); the name "Kalman filtering" is not used 
because the results are independent of the particular representation of the optimal filter. 

Let Q = BB* and X = C*C. If the observations are always available (7 = 1), the evolution 
of P(A;) is deterministic and obeys the recursioijl] 

^:Ph^ ((APA* + Q)-i+X)"'. (1) 

If (A, B) is stabilizable and (A, C) is detectable, then g has a fixed point Poo to which P(A;) tends 
regardless of the initial value P(0) [3]. The Kalman filter and analogous variants implement the 
recursion with different representations for P, and faster and more numerically stable algorithms 
than ([U), which is used in the present analysis for convenience and compactness (apply one of 
the matrix inversion lemmas to obtain the usual Riccati recursion). 

If 7 G (0, 1), the evolution of P(fc) is not deterministic anymore. A convenient way to represent 
the evolution of P(A;) is in the form of an Iterated Function System [12] 

\g: Ph^((APA* + Q)-i+X)-\ = 7, 

S=l (2) 

yh: P^APA* + Q, p^ = l-7. 

A stationary distribution for P exists for all values of 7 (under much more general conditions 
than Bernoulli observations) [13]. In the following, the stationary distribution is referred to as 
the Linear/Gaussian/Bernoulli (LGB) distribution, and P refers to the stationary variable. 
Sinopoli et al. [6] showed that there is a threshold 7^ such that, for 7 < E{P} is unbounded. 

'Note that this paper uses the posterior covariance matrix (P(fc) = Pfc|fe = cov(a;(A:) — a;(fe)|i/'(l), . . . ,y'{k)). The a-priori 
covariance Pfc|j._i and a-posteriori Pfe|fc are linked by the simple relation Pfeifc-i = APfc_i|fe_iA* + Q hence there is no loss 
of generality for investigating the boundedness of the stationary distribution: E{Pfe|fc_i} is bounded if and only if EjPfcH;} is. 
Using the posterior covariance matrix seems a better choice for LGB filtering because, when written with information matrices 
(Y (AY^^A* + Q)^'^ +2') the difference between the maps g and h is the constant term X [11]. 



June 9, 2009 



DRAFT 



This is equivalent to say that the square of the Euclidean norm of the estimation error, E{||e||2}, 
is unbounded. The threshold 7^ depends non trivially on the parameters of the system, and a 
precise characterization is object of current research [6]-[9]. 

The way this result is often interpreted is that for 7 > % the behavior of the system is 
qualitatively different than for 7 < 7^, and therefore 7^ is called "critical" probability. However, 
if one considers another measure of performance, the critical probability changes, as described 
by the following result. 

Proposition 1: Consider the problem of LGB filtering in the scalar case, with A = a > 1, 
X = X>0, Q = Q>0. Then the expected squared error E{e^} and the expected error IE{|e|} 
have two different critical probabilities: 

1) E{e^} is bounded if and only if 7 > 7^ = 1— l/a^. 

2) E{|e|} is bounded if and only if 7 > 7'^ = 1 — l/\a\. 

Proof: It is convenient to define two others systems, which are respectively an "optimist" 
and a "pessimist" approximations to the iteration defined by The stationary distribution has 
support in the set {P|P > Poo}- The optimist approximation simplifies the map g to a constant 
by considering the best scenario P = Poo: 



^opt : Popt ^ Wopt, Wopt = ((a^Poo + Q)-' + 



5 



Sopt = \ (3) 

^opt '■ Popt ^ Ct^Popt + Q- 

The pessimist approximation considers the worst case (P 00): 



Qpes '■ Ppes ^ Wpes, Wpes — X 



5pes = { (4) 
^pes • Ppcs ' ^ Ppcs ~l~ Q- 



The two systems are identical except for the "reset" values Wopt and Wpes after an observation 
is received. It is straightforward to check that, if Popt(O) = P(0) = Ppcs(O) and the three systems 
see the same sequences of observations, Popt(fc) < P(^') < Ppcs(^)- Therefore, for the stationary 
variables, E{Popt} < E{P} < E{Pp,J and El^I^} < E{VP} < Ej^/p;;;;}. 
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Obviously E^^ e{e^(A;)} = P{k), by the definition of covariance and the fact that ]Et^,e{e(A;)} = 
0. Moreover, in the case of a Gaussian distribution, one can show that E^^ e{ |e(/c) | } = ^/2/'n-y^F{k). 
Therefore, upper and lower bounds for E{e^} and E{|e|} can be found as 

E{Popt} < E{e2} < E{Ppes}, (5) 

V^E{v^} < E{|e|} < v^E{v^}. (6) 

The rest of the proof estimates the terms in these expressions and is inspired by some ideas 
in [14]. The pdf for the stationary distribution for the two IFSs dl])-® can be computed in closed 
form. Consider, for example, the IPS in ([3]). The value of Popt at time k can be written in closed 
form as a function of r(A;), the number of steps that passed without receiving an observation 
(T{k) = if the last observation was received): 

T(fc)-1 

Popt(fc) = (a')^('^)Wopt+ Yl 

i=0 

Assuming independent arrivals, r(A;) has the probability distribution P ({^"(A;) = j}) = (I—^Yj. 
The expected value EjPopt} can be computed as Yl'^o'P {{^{k) = j}) Popt('r(A;)), giving 

IE{Popt} = 7 (W.p, + f: K(l - 7)]^' - 

The series converges, and E{Popt} is bounded, if and only if a^(l — 7) < 1 (as already proved 
in [6]). Analogously, the expected value E{a/H^} can be computed as 

E{ v/IV} = tE ((1 - ^)H)' \/(wop. + ^)-p)J^. 

The series converges, and Eji^Popt} is bounded, if and only if |a|(l — 7) < 1. 

Because the proof did not rely on the value of Wopt, the same convergence critical values are 
valid for the pessimist approximations E{Ppes} and E{ ^/Ppcs} as well. By taking into account ^ 
and (l6l), we see that a^(l — 7) < 1 is a necessary and sufficient condition for boundedness of 
E{e^}, and likewise |a|(l — 7) < 1 for boundedness of E{|e|}. ■ 
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Because 7^ > 7'^, there is a range of values (T'cjTJ such that E{|e|} is bounded, but E{e^} 
is not. The value 7'^ is as least as "critical" as The goal of this paper is not to advocate 
the use of the boundedness of E{||e||} rather than E{||e||^} as a criterion of stability; rather, 
it is of more interest to discuss what are the assumptions behind using one or the other. Is 
one "intrinsically" more correct? Other that in Kalman filtering with intermittent observations, 
similar questions arise in other problems where one must compute an "average" accuracy [5]. 
In general, the expected value is not invariant to change of coordinates, so a different average 
accuracy is obtained if one considers the average of covariances, of standard deviations, or of 
information matrices. 

Some answers to these questions can be found by setting the problem in a geometric frame- 
work. In particular, instead of considering the set y(n) of co variance matrices as a subset of 
M"^", one can consider, more abstractly, the manifold S(^) of Gaussian distributions. The next 
section shows how the operation of expected value E{ } can be generalized to Riemannian 
manifolds, such that one can define a "Riemannian mean" M{ } independently of the choice of 
coordinates. 

III. Means on Riemannian manifolds 

Classical mathematical statistics [15] developed in the first decades of last century in the 
context of Euclidean spaces. Subsequently, it became clear that many applications would benefit 
from rigorous coordinate-free approaches to statistics on manifolds. Examples of such applica- 
tions and corresponding manifolds include robotics [16] (motion groups), shape analysis [17] 
(size-and-shape spaces), radar imaging [18] (Grassman manifold), diffusion tensor magnetic 
resonance imaging [19], [20] (positive definite tensors), and Lie groups in general [21]. This 
section recalls the definition of Riemannian mean on manifolds; the reader is assumed to be 
familiar with basic differential geometry (e.g., [22]). 

Let X be a random variable taking values in with joint cumulative distribution function 
The expected value of X (or Euclidean mean, or simply mean) is defined, in the most general 



June 9, 2009 



DRAFT 



8 

terms, as the Lebesgue-Stieltjes integral 

E{X}= [ xdfi{x). (7) 

This definition is not directly generalizable to manifolds because it assumes that the set has a 
vector space structure. However, the mean satisfies a variational property, being the point that 
minimizes the quadratic risk 

E{X} = argminEjllX-ylla} . (8) 
y 

Definitions (|7]) and ([8]) are easily seen to be equivalent in the case of vector spaces. The second 
definition has the benefit that it can be generalized to any metric space. 

In particular, it can be generalized to Riemannian manifolds. Recall that a Riemannian man- 
ifold (M, g) is a differentiable manifold M equipped with a smooth metric g on the tangent 
space. The "length" £(7) of a differentiable curve 7 : [0, 1] ^ M is defined by 

%)= l\g{i{t),i{t))dt. 
Jo 

Given the notion of length, one defines the distance between two points mi,m2 G M as 

d{x,y) = inf{£(7) | 7is a differentiable curve joining x and y}. 

Consider a Riemannian manifold (M, g) with corresponding distance d. Generalizing ([8]) for 
a random variable X taking values in M, define the Riemannian mean (also called: Rieman- 
nian barycenter, Riemannian center of mass, Frechet mean or Karcher mean) as the point that 
minimizes the average quadratic distance: 

M{X} = arg irrf E {d'^{X,y)] . (9) 

The Riemannian mean is unique for a simply connected manifold of non positive sectional 
curvature [23] — as counterexamples, the reader may consider the distribution consisting of a 
pair of antipodal points on the unit circle (a non-simply connected, zero curvature manifold) 
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and on the unit sphere §^ (a simply connected, positive curvature manifold). See [24] for an 
alternative characterization of the Riemannian mean using the inverse of the exponential map, 
and, more in general, see [18] for a short introduction to modern intrinsic estimation on manifolds. 

IV. Different metrics for the manifold of Gaussian distributions 

The random availability of the observations in the LGB filtering setup induces a stationary 
distribution on S(^), the manifold of Gaussian distributions on R", for the estimate (a;,P). In 
particular, the estimation error e = x — x has a Gaussian distribution with zero mean, therefore 
we focus on the submanifold So(^) of Gaussian distributions with mean 0. The Riemannian 
mean depends on the choice of a metric, and this section considers several such options. In the 
following, for compactness of notation, sometimes we confound So('^) with 'J'{n), for example by 
writing the distance between Gaussian distributions (i(^(0, Pi), ^(0, P2)) directly as (i(Pi, P2). 

A. Flat metric for covariances 

The traditional way to represent a Gaussian distribution is by using its mean and covariance, by 
identifying ^{n) with M" x !P(n). This is what most consider to be the "natural" representation. 
Considering y(n) as a convex cone of M"^" seems also to fit very well with the operation of ex- 
pected value, because the expectation is nothing other than a glorified convex linear combination; 
the convexity is also useful in optimization, for example in semidefinite programming [25]. 

From this point of view, y{n) inherits the Euclidean metric of M"^". This is the metric 
implicitly used by Sinopoli et al. [6]. The distance between two Gaussian distributions reduces 
to the Frobenius distance between the two covariance matrices: 

d{g{o,p,),g{o,P2)) = ||Pi -P2IIF. (10) 

For the Riemannian mean, one obtains that the covariance of the mean distribution is the expected 
value of the covariances: M{^(0, P)} = ^(0, E{P}). Note that this mean is affine-invariant, i.e. 
invariant to a change of coordinate P 1-^ APA* for A G GL(?7,), but the distance (ITOl) is not. 
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B. Flat metric for information matrices 

The Information filter [3] utilizes a different parametrization (r/, Y) to represent a Gaussian 
distribution (77 = P^^a; and Y = P^^); this gives a different embedding of S(ri) in M" x y{n). 
One could make the argument that information matrices are a more natural parametrization for 
Gaussian distributions: in the canonical representation of Gaussian distributions as an exponential 
family [15], the information matrix is the natural parameter; in fact, one writes the probability 
density function using P'^. With this choice, the distance between distributions is given by 

rf(6^(o,Pi),6;(o,P2)) = i|Pr'-p^'llF. 

The Riemannian mean is M{^(0,P)} = ^(0,E{P"^}-i). It is easy to see that, in the Lin- 
ear/Gaussian/Bernoulli case, M{^(0,P)} exists for all values of 7 G (0,1], because P^^ is 
bounded by As in the previous case, the mean is affine-invariant, but the distance is not. 

C. Flat metric for square root of covariances 

The matrix equivalent of the scalar standard deviation is the square root of the covariance 
matrix; the eigenvalues of \/V are the standard deviations. The distance between distributions 
can be defined as 

d{g{{), Pi), e;(o, Pi)) = 1 1 - v^al If, 

and consequently the Riemannian mean is M{^(0, P)} = ^(0, E{-\/P}^)- By Jensen's inequality 
and the fact that P ^ \fP is operator-concave, it follows that E{a/P}^ < E{P}. Thus the 
Riemannian mean for this distance exists in all cases when E{P} exists; moreover, as shown 
by Proposition [H in some cases, the critical probability for boundedness of E{V^} is strictly 
less than the critical probability for E{P}. 

D. Fisher Information Metric 

The problem we are analysing is special in two regards: 1) we are concerned with doing 
statistics on a certain manifold S(ri); and 2) the elements of the manifold represents 
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probability distribution themselves. The branch of statistics that studies the properties of the 
families of probability distributions considered as a manifold is called information geometry and 
is a relatively recent development with respect to classical mathematical statistics [26], [27]. 

From this point of view, the manifold has a natural metric given by the generalization of 
the Fisher Information Matrix (FIM) as a Riemannian metric. We recall the definition of the 
FIM in the Gaussian case [15]. Assume that the available observations z have a Gaussian 
distribution whose mean and co variance are parametrized by an unknown parameter G M": 
z ~ Q{fi{0), Ti{6)). The FIM for 6 is the n x n semidefinite positive matrix T[6] defined as 

The FIM gives the information contained in the samples about the value of 6; for example, 
using the FIM one defines the Cramer- Rao Bound for unbiased estimators as cov(^) > X[0]^^. 
The FIM can be generalized to be a Riemannian metric for the manifold 9(^)- If we restrict to 
the submanifold So{n), given two elements X, Y G S(n) in the tangent space at ^(0,P), the 
Fisher Information Metric is 

g{X,Y) = ^Tr{p-'XP-'Y}. (12) 
Compare ([T2l) with the second term in (fTTI) . The distance induced by this metric is (e.g., [19]): 



i=l 



1/2 



(13) 



This distance is "natural" in the sense that it is linked to the probability of distinguishing the 
two distributions ^(0,Pi) and Q{0,P2) by observing their samples, in a sense which is made 
precise in [26]. Unfortunately, a closed form expression for writing M{^(0,P)} is not known. 

The use of this natural distance on Soin) ~ 7(n) allows to show that some naive results 
obtained using the flat metric on covariance are incorrect [18]. For example, in basic mathematical 
statistics courses, one teaches that, given a set of samples {xi}^^^ from a distribution with 
covariance P, the bias-corrected sample covariance matrix P^c = J2i=ii^ ~ Xi){x — Xi)* 
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is an unbiased and efficient estimator of P, i.e. E{Psc} = P and P^c reaches the Cramer- 
Rao bound. However, it is also well known that Psc performs poorly at low sample support. 
Smith's [18] explanation to this conundrum is that P^c is not unbiased according to the natural 
metric: M{P,J ^ P. 

Ignoring the Fisher Information Metric interpretation, the distance defined by (fT3l) is also natu- 
ral for when it is considered either as a symmetric space, or as the quotient space GL(n) /0{n) [18]. 
The distance has several other useful properties in the context of LGB filtering. (!P(n),(i) is a 
complete metric space [13] and a geodesically complete manifold with nonpositive curvature [20]. 
The distance d induces the usual topology [13] on The distance is invariant to affine 

transformations, and also to inversion P i-^ P^^; this last property is useful because one can 
use either covariance matrices or information matrices: M{P} = M{P^^}^^, which is not true 
if one uses the expected value. Using this distance it is also easy to show contraction properties 
for the Riccati iterations g, h that guarantee the existence of the stationary distribution [13]. 

It is possible to show that there is no "critical probability" if one uses this metric. To this 
end, one should first prove that the system has a stationary distribution for all values of 7 > 0. 
This is done in the next section. Then, in Section |IV-F[ it is proved that the Riemannian mean 
of this distribution exists. 

E. Existence of the stationary distribution 

In this section, we prove the existence of the stationary distribution, for every value of 7 > 0. 
This can be done by using some results from Bougerol [13] regarding the contraction properties 
of the maps h ando, and some results from Bamsley et al. [28] about the convergence of Iterated 
Function Systemfl Once these results are recalled, the conclusion will be immediate. 

We need some preliminaries from [28]. Let (X, rf) be a complete metric space. Let fi, 
i = 1, . . . ,n he Lypschitz functions from X to X, that is, there exists Sj > such that 

d{fi{x),fi{y)) < Sid{x,y) for all x,y in X. We say that fi is "nonexpansive" if Sj < 1, and 

^Because this paper is not available electronically yet, the results are stated here extensively. 
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we say that it is a "strict contraction mapping" if Sj < 1. Assign a set of probabilities Pi, to 
these functions, such that Pi > and X^iLi^** ~ Consider now the Iterated Function System 
{{fi,Pi)} and the corresponding Markov chain, which we denote {Zi},i > 0. We say that a 
measure /i is "attractive" if, for every initial distribution of Zq, the process Zi converges in 



on X. It is an intuitive result that, if all the fi are strict contractions, then the process "tends to 
forget" the initial conditions, and a stationary distribution exists. What is not trivial is that IFSs 
converge in distribution with much weaker hypotheses, as shown by the following result. 

Theorem 1: (Bamsley et al. [28]) Suppose that the fi satisfies an average contractivity con- 
dition as follows: for all x^y E X, 



Then there exists a unique, attractive invariant probability measure for the IFS. 

Remark 1: Note that it is not assumed that the single fi are strict contractions (Sj < 1) or 
even contractions (sj < 1). This theorem can also be generalized to the case in which the 
transition probabilities depend on the state {pi = Pi{x)), although some additional hypotheses 
are required [29]. 

We now recall the following from [13]: 

Lemma 1: (Bougerol [13]) In the metric d defined by ([T3l) . 

1) The maps h and g are nonexpansive mappings: d{h{Pi), h{P2)) < d(Pi, P2), and equiv- 
alently for g. 

2) If A is non-singular, (A, B) is controllable, (A, C) observable, the composition g^ = 
g o ■ ■ ■ o g of n copies of is a strict contraction mapping; that is, there exists p = 



From these results, the following result is easily proved. 

Proposition 2: If A is non-singular, (A, B) controllable, (A, C) observable, then the station- 
ary distribution for P exists for all 7 > 0. 



distribution to /i, that is, limj^oo IE{/(Zi)} = J f dp for every bounded continuous function / 




(14) 



p(A,B,C) < 1 such that d{g{Pi),g{P2)) < pd{P^,P2). 
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Proof: Consider the behavior of the system at intervals of n steps. This corresponds to 
considering the "power" IFS 5" = {(^'',7"), (^o /i""\7(l -7)""^), • • • , (Z^'', (1 -7)")}, which 
is created by all 2" possible combinations of length n of the functions h, with corresponding 
probabilities. By Lemma [B g" is a strict contraction, and all the other combinations are non- 
expansive mappings. Therefore, assuming 7 > 0, the system satisfies the average contractivity 
condition (fT4l) . and by Theorem \T\ the stationary distribution exists. ■ 

F. Existence of the Riemannian mean for Fisher Information Metric 

After having ascertained that the stationary distribution exists (Proposition O, we now prove 
existence of the Riemannian mean. 

Proposition 3: The Riemannian mean of the LGB distribution for the distance (fT3l) exists for 
all 7 G (0, 1]. 

Proof: The stationary distribution of the IFS S = {{g,^),{h,l —7)} is equivalent to 
that of the power IPS 5" = {((/", 7"), o 7(1 - 7)""^), • • • , (Z^", (1 -7)")}, obtained by 
considering compositions of length n of the functions {g, h) with corresponding probabilities. 
We now build the IFS iSp^g, a "pessimist" approximation to 5". By recalling that g and h are 
order-preserving, and 5'(M) < /i(M) for all M [6], one can bound all mixed terms in g, h 
in 5" by Furthermore, one can also find an upper bound for g'^: because the system is 
observable, the uncertainty is bounded over all the state space after n consecutive observation 
are received. Therefore, Wpes = supp>p^ 5'"(P) exists and is bounded: Poo < Wpes < 00. Thus 
the pessimist approximation to 5" is S"^^^ = {(Wpes, 7"), (^", 1 — t")}- Seeing the stationary 
variables P (for iS") and Ppes (for S^^^ as functions of the past infinite sequence of arrivals 
u = {7(0), 7(-l), 7(-2), . . . } G {0, l}^ one has that 

Poo < P(^) < Ppcs(^). (15) 

To prove that M{P} is bounded for all 7, it is sufficient to show that the minimization 
problem ^ is feasible for all 7. To prove this, it is sufficient to show that E{(i^(X, P)} is 
bounded for some matrix X; it is convenient to choose X = Pq^. By (fTSl ) and Lemma |2] below, 
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we obtain that d{P^,P{uj)) < d(Poo, Ppes(^)) and thus E{d^{P^,P)} < E{d\P^,Pp,,)}. It 
follows that M{P} is bounded if M{Ppes} is. 

We now investigate boundedness of ¥.{d^(Poo, Ppes)}- Choose an M such that MPqoM* = I 
and do a change of coordinates P ^ MPM*. One finds that E{d^{Poo, Ppcs)} = E{d'^{I, P^J}, 
where Pp^^ = MPpcsM* > I. Note that because KiP^cs) > we can find the bound d'^{l, Pp^J = 
^"^^ log^ (Ai(Ppgg) < n log^ (||Pppg||). An expression for Ppes(fc) can be written explicitly as 
in the proof of Proposition [T] as a function of T{k), the number of steps passed without receiving 
an observation (recall that one time step in the IFS 5" corresponds to n steps of S): 

nr(fc) — 1 

Ppes(fc) = A"-('=)Wp,, (A*)'^-(^) + £ A^Q(A*y. 

From this one finds the bound ||Ppcs(A;)|| < ci || All^""^^^-* forsomeci > O.Thusrilog^ (||Ppes||) — 
T{kYc2 + T{k)cs + C4 for some C2, C3, C4 > 0. As in the proof of Proposition [H the expectation 
can be computed with respect to T{k), and one obtains 

00 

E{d\p^, p)}<r 5^(1 - ry [*'c2 + + . 

i=0 

Series of the kind Xli^o^'^^* ^^'■^ A; > are convergent if \x\ < 1, hence E{(i^(Poo, P)} is 
always bounded if 7 G (0, 1]. Therefore, the Riemannian mean is always bounded. ■ 

Lemma 2: For the distance defined in Pi < P2 < P3 =^ c?(Pi, P2) < d{Pi, P3). 

Proof: (sketch) First, reduce to the case Pi = I by letting P^ = MPjM*, with M chosen 

such that MPiM* = I. Then verify d{I, Pg) < d{l, P3) by direct computation using (O. ■ 

After one has proved the existence of M{P}, uniqueness follows from the fact that the manifold 

has nonpositive curvature [23]. 

V. Conclusions 

Algebra is the offer made by the devil to the mathematician. The devil says: T 
will give you this powerful machine, it will answer any question you like. All you 
need to do is giving me your soul: give up geometry and you will have this marvellous 
machine.' [30] 
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— Sir Michael Atiyah (1929-) 
The righteous engineer must refuse the devil's offer. Reframing problems in a geometric frame- 
work usually allows to spot the hidden assumptions, and therefore to check whether the results 
have a physical meaning, or they are just figments of the mathematical formalization. 

The hidden assumption in the work of Sinopoli et al. is that positive definite matrices are 
treated as a convex cone of M"^". This is perhaps the most intuitive interpretation, and has good 
consequences in certain contexts, such as semidefinite programming [25]. However, this could 
lead to incorrect conclusions when doing rigorous intrinsic estimation. This is well shown by 
Smith's example [18], that even though the sample covariance matrix is unbiased in the naive 
sense (E{Psc} = P), it is biased in the intrinsic sense (M{Psc} 7^ P), thereby contradicting 
what is taught in elementary statistics courses. 

In the case of the Linear/Gaussian/BemouUi filtering problem, if one uses the average standard 
deviations, instead of the average covariances, one obtains a different critical probability (Propo- 
sition [T]). It is pointless to discuss which critical probability is more critical than the other, but 
surely considering the average error is more natural than considering the average error squared. 
The point is that the boundedness of the expected value cannot be considered as a criterion 
for "stability"; there are plenty of well-behaved probability distributions which have infinite 
moment]^. 

Sinopoli's critical probability is critical only in the sense that it is the threshold under which 
E{P} ceases to be meaningful as a performance measure. Under the threshold, the qualitative 
behavior of the system does not change, as one can see from considering the Riemannian mean 
derived from the intrinsic Fisher Information Metric (Proposition [3]) — that is possibly a less 
intuitive, but more natural way to represent the concept of "average uncertainty". Thus the 
intense research effort in trying to characterize 7^ seems misplaced; of more interest is studying 
the entire LGB distribution [11], [14]. 

^ Consider as an example the Pareto distribution, defined as P({X > x}) — x~'' for a:: > 1, fc > 0: for this distribution, 
E{X} is bounded only if fc > 1, and E{X^} only if fc > 2; yet it has a regular power law. All the statistics that remain finite 
change smoothly when k goes through the "critical" values 1 and 2. 
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